Exploring Complex Disease Gene Relationships Using Simultaneous Analysis
Authors
Abstract
The characterization of complex diseases remains a great challenge for biomedical researchers due to the myriad interactions of genetic and environmental factors. Adaptation of phylogenomic techniques to increasingly available genomic data provides an evolutionary perspective that may elucidate important unknown features of complex diseases. Here, an automated method is presented that leverages publicly available genomic data and phylogenomic techniques. The approach is tested with nine genes implicated in the development of Alzheimer Disease, a complex neurodegenerative syndrome. The developed technique, implemented through a suite of Ruby scripts entitled “ASAP2,” first compiles a list of sequence-similarity-based orthologues using PSI-BLAST and a recursive NCBI BLAST+ search strategy, then constructs maximum parsimony phylogenetic trees for each set of nucleotide and protein sequences, and finally calculates phylogenetic metrics (partitioned Bremer support values, combined branch scores, and Robinson-Foulds distance) to provide an empirical assessment of evolutionary conservation within a given genetic network. This study demonstrates the potential for using automated simultaneous phylogenetic analysis to uncover previously unknown relationships among disease-associated genes that may not have been apparent using traditional, single-gene methods. Furthermore, the results provide the first integrated evolutionary history of an Alzheimer Disease gene network and identify potentially important co-evolutionary clustering around components of oxidative stress pathways.
PeerJ PrePrints | http://dx.doi.org/10.7287/peerj.preprints.230v1 | CC-BY 4.0 Open Access | received: 1 Feb 2014, published: 1 Feb 2014
Introduction
Classical genetic diseases typically arise due to isolated genetic changes within a single gene or allele (Badano & Katsanis, 2002). Many of these “simple” or “monogenic” diseases follow Mendelian patterns of inheritance.
The responsible genetic lesion is often the result of an insertion or deletion event, or of the transition or transversion of a nucleotide. The transmission of simple genetic disorders can thus be readily predicted and generally follows sex-linked or autosomal patterns of heredity. Classic examples of monogenic disorders include cystic fibrosis, sickle cell anemia, and achondroplasia (Velinov et al., 1994; Kerem et al., 1989; Rees et al., 2010).
By contrast, complex diseases or disorders may not follow clear hereditary patterns or be diagnosed based on isolated genetic lesions. However, many complex diseases, such as cardiovascular disease, type 2 diabetes mellitus, and Alzheimer disease, occur with higher frequency among families and close genetic relatives, suggesting that genetic factors may play a central role in their pathogenesis, beyond environmental or behavioral factors (Sillén et al., 2006). Assessing the risk of developing complex diseases or disorders, and devising future approaches for treating or preventing them, may benefit from high-throughput, computational, or bioinformatics-based approaches. For example, computational approaches such as those used in genome-wide association studies, exome sequencing, proteomics, and microarray analyses have shown great promise in recent years. Related advances in biotechnology have facilitated the identification of genotypes that may contribute to the heritability of complex genetic diseases (Yonan et al., 2003). In particular, specific genotypes can be assigned a probabilistic value of susceptibility relative to the gene(s) they influence and thereby correlated with a disease phenotype (Li et al., 2005; Newton-Cheh et al., 2009; Klein et al., 2012).
Due to a lack of knowledge about the specific mechanisms by which multiple genetic factors may influence complex diseases, pharmacotherapies are often aimed at managing symptoms or laboratory values, and are thus reactive rather than curative. Often, complex disease management combines pharmacotherapy with attempted environmental changes, such as those conveyed through patient education or lifestyle modification, to reduce susceptibility (Estruch et al., 2013). A major current goal of biomedical research is therefore to better characterize the genetic factors that may contribute to developing complex diseases. The fact that the genetic environment influences susceptibility to complex disease implicates the structural or functional relationships between some or all members of a disease-associated gene network in the development of the disease (Li et al., 2005). This relationship might be a direct physical interaction between the protein products of the genes, parallel functionality in metabolic pathways, or co-localization of protein products in a certain cell or tissue type (Li et al., 2005). Such relationships are not easily elucidated using an experimental approach focused on a single gene or pathway; they require a broader systems-based methodology. These types of relationships may be reflected in the evolutionary conservation of genes or gene groups among organisms with and without susceptibility to a given disease (Thornton & DeSalle, 2000; Watson et al., 2014). Mapping the evolutionary patterns of gene conservation or co-evolution associated with a complex disease may identify previously unknown clusters of genes or functional pathways that affect a disease process.
Simultaneous Analysis
Phylogenetic analyses infer potential evolutionary relationships based on similarities implying common descent from shared ancestry and are performed on data sets consisting of physical, functional, or molecular representations (Swiderski et al., 1998). Genomic analyses typically construct the analytic matrix using nucleotide or amino-acid sequences from different individuals or species (termed “taxa”; singular “taxon”). Classically, the resulting data are presented as trees where the branching points (termed “nodes”) give rise to hierarchical groupings of more similar taxa (akin to leaves on a branch). These trees can be used to explore potential patterns of divergence from a common ancestor as well as the degree of difference among taxa included in the tree. This degree of difference is usually described as an evolutionary “distance” that can be inferred in multiple ways, but typically represents a measure of evolutionary change (based upon sequence differences) or the amount of time since divergence likely occurred (Zharkikh, 1994; Hedges et al., 2006). However, like experiments focused on a single gene or pathway, an isolated phylogenetic analysis may not capture important features of co-evolution or conservation of gene clusters impacting complex disease processes. Additionally, reliance on phylogenetic trees of individual genes may not fully address the potential for genetic changes such as lateral gene transfer, reversion of mutations, or recombination events (Dagan, 2011; Layeghifard et al., 2013). To account for multiple evolutionary patterns represented by multiple genes, data matrices can be combined into a single phylogenetic analysis through a “simultaneous analysis” (SA) approach (Nixon & Carpenter, 1996; Gatesy et al., 1999; Rokas et al., 2003).
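The combination of data matrices described above can be illustrated with a minimal Python sketch. It is not the ASAP2 implementation (which the authors wrote in Ruby). It assumes that each gene's sequences are already aligned, that taxa absent from a gene are padded with the missing-data symbol "?", and that the column range of each gene is recorded so per-gene metrics can be computed later. The function name `build_supermatrix`, the gene labels, and the toy sequences are hypothetical.

```python
from typing import Dict, Tuple

def build_supermatrix(
    partitions: Dict[str, Dict[str, str]], missing: str = "?"
) -> Tuple[Dict[str, str], Dict[str, Tuple[int, int]]]:
    """Concatenate per-gene alignments into one combined matrix.

    `partitions` maps a gene name to {taxon: aligned sequence}.
    Returns the combined matrix and, for each gene, the half-open
    (start, end) column range it occupies in that matrix.
    """
    # Union of all taxa seen in any gene, in a stable order.
    taxa = sorted({taxon for aln in partitions.values() for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    boundaries = {}
    offset = 0
    for name, aln in partitions.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"partition {name!r} is empty or not aligned")
        width = lengths.pop()
        for taxon in taxa:
            # Taxa lacking this gene are filled with missing-data symbols.
            supermatrix[taxon] += aln.get(taxon, missing * width)
        boundaries[name] = (offset, offset + width)
        offset += width
    return supermatrix, boundaries

# Toy input (hypothetical sequences): geneB has no fly sequence.
partitions = {
    "geneA": {"human": "ACGT", "mouse": "ACGA", "fly": "TCGA"},
    "geneB": {"human": "GGC", "mouse": "GGT"},
}
matrix, bounds = build_supermatrix(partitions)
for taxon, row in matrix.items():
    print(taxon, row)
print(bounds)
```

Keeping the column boundaries alongside the combined matrix is a design choice of this sketch: it lets a single tree search run on all data while still allowing each gene's contribution to be evaluated separately.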
In SA, individual data blocks (e.g., a sequence matrix for a particular gene; referred to as a “partition”) are systematically combined to enable higher-order analyses that go beyond what can be derived from analysis of an individual partition. Frequently, SA values are derived by applying arithmetic operations to other (already determined) SA values, so the workflow tends to follow a stepwise pattern. Previous studies have shown that SA techniques strengthen the overall support for the evolutionary patterns represented by trees determined by single-partition phylogenetic analyses (Baker et al., 1998). In this study, a previous automated SA approach (Automated Simultaneous Analysis Phylogenetics, ASAP; Sarkar et al., 2008) was refined to collect and analyze disease genes based on: (1) the degree of corroboration between partitions; and (2) the support for an overall consensus tree modeling a putative evolutionary relationship common to all partitions, using maximum parsimony analysis (Fitch, 1971). The final phase then generates a phylogenetic network based on the Robinson-Foulds tree similarity metric (Robinson & Foulds, 1981).
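The Robinson-Foulds metric used in the final phase counts the bipartitions (splits) of the taxa that appear in one tree but not in the other. The following Python sketch shows the idea under stated assumptions: trees are written as nested tuples of taxon names, all trees share the same taxa, and rooting is ignored. The gene labels, taxon names, and tree shapes are hypothetical, and the way ASAP2 turns these distances into a network is not described in this section, so only the pairwise distances are computed.

```python
from itertools import combinations

def leaf_set(tree):
    """Return the set of taxon names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def splits(tree):
    """Collect the non-trivial bipartitions (splits) implied by a tree.

    Each split is stored as the side that does NOT contain a fixed
    anchor taxon, so the result is independent of the tree's rooting.
    """
    taxa = leaf_set(tree)
    anchor = min(taxa)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        rest = taxa - clade
        # Splits with fewer than two taxa on a side occur in every tree.
        if len(clade) >= 2 and len(rest) >= 2:
            found.add(rest if anchor in clade else clade)
        return clade

    visit(tree)
    return found

def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must be built on the same taxa")
    return len(splits(tree_a) ^ splits(tree_b))

# Hypothetical gene trees over five taxa A..E.
gene_trees = {
    "gene_1": ((("A", "B"), "C"), ("D", "E")),
    "gene_2": ((("A", "C"), "B"), ("D", "E")),
    "gene_3": ("A", ("B", ("C", ("D", "E")))),
}
distances = {
    (g1, g2): robinson_foulds(gene_trees[g1], gene_trees[g2])
    for g1, g2 in combinations(gene_trees, 2)
}
for pair, d in distances.items():
    print(pair, d)
```

In this toy input, gene_1 and gene_3 differ only in where the root is drawn, so their distance is zero, while gene_2 groups A with C instead of B and differs from both by two splits. Genes whose trees lie at small distances from one another would be candidates for clustering in a similarity network.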
Similar resources
Exploring Gene Signatures in Different Molecular Subtypes of Gastric Cancer (MSS/ TP53+, MSS/TP53-): A Network-based and Machine Learning Approach
Gastric cancer (GC) is one of the leading causes of cancer mortality, worldwide. Molecular understanding of GC’s different subtypes is still dismal and it is necessary to develop new subtype-specific diagnostic and therapeutic approaches. Therefore developing comprehensive research in this area is demanding to have a deeper insight into molecular processes, underlying these subtypes. In this st...
Global gene expression analysis using microarray to study differential vulnerability to neurodegeneration
Neurodegenerative disorders such as Parkinson’s disease, motor neuron disease and Alzheimer’s disease is characterized by loss of specific cells within certain regions of the brain. One of the most compelling questions is to determine why specific cell populations are vulnerable to neurodegeneration. We addressed this question by studying global gene expression changes using an animal model of ...
Characterization and Phylogenetic Analysis of Magnaporthe spp. strains on Various Hosts in Iran
Background: Populations of Magnaporthe, the causal agent of rice blast disease, are pathotypically and genetically diverse and therefore their interaction with different rice cultivars and also antagonistic microorganisms are very complicated. Objectives: The objectives of the present study were to characterize phylogenetic relationships of 114 native Magnaporthe strains, isolated from rice a...
Exploring the Impact of Topographical and Climate Factors on Generation of the Vulnerability-map of Leptospirosis
Leptospirosis is one of the most widespread zoonotic disease caused by Leptospira bacteria. It is found wherever human is in direct or indirect contact with Leptospira bacteria thorough infected animals as well as contaminated soil or water. The disease is mostly found in tropical, subtropical, hot, and humid areas. The main objectives of this study are to investigate the seasonality relatio...